Vectorized UDFs in Column-Stores

نویسندگان

  • Mark Raasveldt
  • Hannes Mühleisen
چکیده

Data Scientists rely on vector-based scripting languages such as R, Python and MATLAB to perform ad-hoc data analysis on potentially large data sets. When facing large data sets, they are only efficient when data is processed using vectorized or bulk operations. At the same time, overwhelming volume and variety of data as well as parsing overhead suggests that the use of specialized analytical data management systems would be beneficial. Data might also already be stored in a database. Efficient execution of data analysis programs such as data mining directly inside a database greatly improves analysis efficiency. We investigate how these vector-based languages can be efficiently integrated in the processing model of operator– at–a–time databases. We present MonetDB/Python, a new system that combines the open-source database MonetDB with the vector-based language Python. In our evaluation, we demonstrate efficiency gains of orders of magnitude.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Deep Integration of Machine Learning Into Column Stores

We leverage vectorized User-De�ned Functions (UDFs) to e�ciently integrate unchanged machine learning pipelines into an analytical data management system. The entire pipelines including data, models, parameters and evaluation outcomes are stored and executed inside the database system. Experiments using our MonetDB/Python UDFs show greatly improved performance due to reduced data movement and p...

متن کامل

How Achaeans Would Construct Columns in Troy

Column stores are becoming popular with data analytics in modern enterprises. However, traditionally, database vendors offer column stores as a different database product all together. As a result there is an all-or-none situation for column store features. To bridge the gap, a recent effort introduced column store functionality in SQL server (a row store) by making deep seated changes in the d...

متن کامل

Vectorwise: Beyond Column Stores

This paper tells the story of Vectorwise, a high-performance analytical database system, from multiple perspectives: its history from academic project to commercial product, the evolution of its technical architecture, customer reactions to the product and its future research and development roadmap. One take-away from this story is that the novelty in Vectorwise is much more than just column-s...

متن کامل

Don't Keep My UDFs Hostage - Exporting UDFs For Debugging Purposes

User-defined functions (UDFs) are an integral part of performing indatabase analytics. Executing data analysis inside a database provides significant improvements over traditional methods, such as close-to-the-data execution, low conversion overhead and automatic parallelization. However, UDFs have poor support for debugging. Since they are executed from within the database process, traditional...

متن کامل

Column Stores for Wide and Sparse Data

While it is generally accepted that data warehouses and OLAP workloads are excellent applications for column-stores, this paper speculates that column-stores may well be suited for additional applications. In particular we observe that column-stores do not see a performance degradation when storing extremely wide tables, and column-stores handle sparse data very well. These two properties lead ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016